An Optimized K-Nearest Neighbor Algorithm for Large Scale Hierarchical Text Classification

Authors

  • Xiaogang Han
  • Junfa Liu
  • Zhiqi Shen
  • Chunyan Miao
Abstract

This paper summarizes an optimized k-nearest neighbor (k-NN) algorithm for the 2nd edition of the Large Scale Hierarchical Text Classification Pascal Challenge. First, we run the k-NN algorithm on the datasets to obtain the top-k nearest neighbors of each test document. Second, we identify several critical category-neighbor features and estimate the impact of each feature through cross-validation. Finally, the category prediction algorithm uses the optimal parameters for the category-neighbor features to predict the categories of the test documents. Experiments on the three challenge datasets show that the classifier achieves high accuracy.
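
The three steps in the abstract can be illustrated with a minimal sketch. The abstract does not name the category-neighbor features or the parameters, so everything specific here is an assumption made for illustration: cosine similarity over sparse term vectors, two features per candidate category (the share of neighbors carrying the category and their share of the total similarity), a hypothetical mixing weight `alpha`, and a hypothetical cut-off `ratio`. In the paper's setting, parameters of this kind would be selected by cross-validation rather than fixed by hand.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k_neighbors(doc, train, k):
    """Step 1: return the k most similar training items as (similarity, categories)."""
    scored = [(cosine(doc, vec), cats) for vec, cats in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def category_scores(neighbors, alpha):
    """Step 2: combine two illustrative category-neighbor features per category.

    count share: fraction of the neighbors that carry the category
    sim share:   fraction of the total neighbor similarity they contribute
    alpha is a hypothetical mixing weight (assumed, not taken from the paper).
    """
    count = defaultdict(int)
    sim = defaultdict(float)
    for s, cats in neighbors:
        for c in cats:
            count[c] += 1
            sim[c] += s
    total_sim = sum(s for s, _ in neighbors) or 1.0
    n = len(neighbors) or 1
    return {c: alpha * count[c] / n + (1 - alpha) * sim[c] / total_sim for c in count}


def predict(doc, train, k=3, alpha=0.5, ratio=0.8):
    """Step 3: keep every category scoring at least `ratio` times the best score."""
    scores = category_scores(top_k_neighbors(doc, train, k), alpha)
    if not scores:
        return []
    best = max(scores.values())
    return sorted(c for c, v in scores.items() if v >= ratio * best)


# Toy data: (term vector, categories) pairs, invented for this example.
train = [
    ({"nn": 1.0, "text": 1.0}, ["ml"]),
    ({"nn": 1.0, "tree": 1.0}, ["ml", "hier"]),
    ({"spam": 1.0, "mail": 1.0}, ["spam"]),
]
doc = {"nn": 1.0, "text": 0.5}
print(predict(doc, train, k=2))
```

Lowering `ratio` lets weaker categories through, which matters for multi-label hierarchical data where a document may belong to several leaf categories.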

Similar resources

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to many kinds of library resources. However, classifying documents within a large amount of data is still an issue, and finding specific documents demands time and energy. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...

A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...

A Modular k-Nearest Neighbor Classification Method for Massively Parallel Text Categorization

This paper presents a Min-Max modular k-nearest neighbor (M-k-NN) classification method for massively parallel text categorization. The basic idea behind the method is to decompose a large-scale text categorization problem into a number of smaller two-class subproblems and combine all of the individual modular k-NN classifiers trained on the smaller two-class subproblems into an M-k-NN classifi...

Neighbor-weighted K-nearest neighbor for unbalanced text corpus

Text categorization or classification is the automated assignment of text documents to pre-defined classes based on their contents. Many classification algorithms assume that the training examples are evenly distributed among the classes. However, unbalanced data sets often appear in practical applications. In order to deal with uneven text sets, we propose the neighbor-wei...

Publication date: 2011